Partial Parsing: Combining Choice with Commitment

Author

  • Malcolm Wallace

Abstract

Parser combinators, often monadic, are a venerable and widely used solution to read data from some external format. However, the capability to return a partial parse has, until now, been largely missing. When only a small portion of the entire data is desired, it has been necessary either to parse the entire input in any case, or to break up the grammar into smaller pieces and move some work outside the world of combinators. This paper presents a technique for mixing lazy, demand-driven, parsing with strict parsing, all within the same set of combinators. The grammar specification remains complete and unbroken, yet only sufficient input is consumed to satisfy the result demanded. It is built on a combination of applicative and monadic parsers. Monadic parsing alone is insufficient to allow a choice operator to coexist with the early commitment needed for lazy results. Applicative parsing alone can give partial results, but does not permit context-sensitive grammars. But used together, we gain both partiality and a flexible ease of use. Performance results demonstrate that partial parsing is often faster and more space-efficient than strict parsing, but never worse. The trade-off is that partiality has consequences when dealing with ill-formed input.
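The demand-driven behaviour described in the abstract can be illustrated with a minimal sketch. This is not the paper's combinator set: it swaps in a plain Python generator as a stand-in for a lazy parser, and the names `CountingInput`, `parse_digits`, and `take` are hypothetical, introduced only for illustration. It shows two of the abstract's claims on a toy grammar (digits separated by `,` and ended by `;`): only as much input is read as the demanded result needs, and an error in ill-formed input surfaces only if the consumer demands that part of the result.

```python
from typing import Iterator, List


class CountingInput:
    """Hypothetical input wrapper that records how many characters were read."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0  # number of characters consumed so far

    def next_char(self) -> str:
        if self.pos >= len(self.text):
            raise ValueError("unexpected end of input")
        ch = self.text[self.pos]
        self.pos += 1
        return ch


def parse_digits(src: CountingInput) -> Iterator[int]:
    """Lazily parse single digits separated by ',' and terminated by ';'."""
    while True:
        ch = src.next_char()
        if ch not in "0123456789":
            raise ValueError(f"expected a digit at offset {src.pos - 1}, got {ch!r}")
        # Commit to this item: hand it to the consumer before reading further.
        yield int(ch)
        sep = src.next_char()
        if sep == ";":
            return
        if sep != ",":
            raise ValueError(f"expected ',' or ';' at offset {src.pos - 1}, got {sep!r}")


def take(items: Iterator[int], n: int) -> List[int]:
    """Demand exactly n results from a lazy parse."""
    return [next(items) for _ in range(n)]
```

With the input `"1,2,3,x;"`, `take(parse_digits(src), 2)` returns `[1, 2]` after reading only three characters, so the ill-formed `x` is never inspected. Forcing the whole result with `list(parse_digits(...))` on the same input raises `ValueError`, which is the kind of consequence for ill-formed input that the abstract names as the trade-off.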


Similar resources

Committee-based Decision Making in Probabilistic Partial Parsing

This paper explores two directions for the next step beyond the state of the art of statistical parsing: probabilistic partial parsing and committee-based decision making. Probabilistic partial parsing is a probabilistic extension of the existing notion of partial parsing, which enables fine-grained arbitrary choice on the tradeoff between accuracy and coverage. Committee-based decision making i...


Towards a Reduced Commitment, D-Theory Style TAG Parser (John Chen)

Many traditional TAG parsers handle ambiguity by considering all of the possible choices as they unfold during parsing. In contrast, D-theory parsers cope with ambiguity by using underspecified descriptions of trees. This paper introduces a novel approach to parsing TAG, namely one that explores how D-theoretic notions may be applied to TAG parsing. Combining the D-theoretic approach to TAG pars...


Robust stochastic parsing: Comparing and combining two approaches for processing extra-grammatical sentences

This paper compares two techniques for robust parsing of extragrammatical natural language. Both are based on well-known approaches; one selects the optimal combination of partial analyses, the other relaxes grammar rules. Both techniques use a stochastic parser to select the “best” solution among multiple analyses. Experimental results show that regardless of the grammar, the best results are ...


Explanation-Based Learning of Partial-Parsers

This paper presents a method for learning efficient parsers of natural language. The method consists of an Explanation-Based Learning (EBL) algorithm for learning partial-parsers, and a parsing algorithm which combines partial-parsers with existing "full-parsers". The learned partial-parsers, implementable as Cascades of Finite State Transducers (CFSTs), recognize and combine constituents efficient...


Workshop Notes of the ECML / MLnet Workshop on Empirical Learning of Natural Language Processing Tasks

This paper presents a method for learning efficient parsers of natural language. The method consists of an Explanation-Based Learning (EBL) algorithm for learning partial-parsers, and a parsing algorithm which combines partial-parsers with existing "full-parsers". The learned partial-parsers, implementable as Cascades of Finite State Transducers (CFSTs), recognize and combine constituents efficient...



Publication date: 2007